The Use of Monolingual Context Vectors for Missing Translations in Cross-Language Information Retrieval
Authors
Abstract
Cross-language text retrieval systems that rely on bilingual dictionaries to bridge the gap between the source query language and the target document language need good dictionary coverage. For terms with missing translations, most systems expand the existing translation dictionaries in some way. In this paper, instead of lexicon expansion, we explore whether the context of the unknown terms can help mitigate the loss of meaning due to missing translations. Our approach consists of two steps: (1) identifying terms that are closely associated with the unknown source-language terms and collecting them as context vectors, and (2) using the translations of the associated terms in the context vectors as surrogate translations of the unknown terms. We describe a query-independent version and a query-dependent version of such monolingual context vectors. These methods are evaluated in Japanese-to-English retrieval using the NTCIR-3 topics and data sets. Empirical results show that both methods improved CLIR performance for short and medium-length queries and that the query-dependent context vectors performed better than the query-independent ones.
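The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the association measure (raw document-level co-occurrence counts), the function names `build_context_vectors` and `surrogate_translations`, the `top_k` cutoff, and the romanized toy terms are all assumptions made for the example. It corresponds loosely to the query-independent flavor, since the vectors are built from the collection alone.

```python
from collections import Counter
from itertools import combinations


def build_context_vectors(docs, top_k=3):
    """Step 1: map each term to its top_k most strongly associated terms.

    Association here is a raw document-level co-occurrence count, which is
    an assumed stand-in for whatever measure the paper actually uses.
    """
    cooc = {}
    for doc in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc.setdefault(a, Counter())[b] += 1
            cooc.setdefault(b, Counter())[a] += 1
    return {
        term: [w for w, _ in counts.most_common(top_k)]
        for term, counts in cooc.items()
    }


def surrogate_translations(term, context_vectors, dictionary):
    """Step 2: translate a term, falling back to its context vector.

    If the dictionary covers the term, use that entry. Otherwise return the
    translations of its associated terms, in association order, without
    duplicates.
    """
    if term in dictionary:
        return list(dictionary[term])
    surrogates = []
    for associated in context_vectors.get(term, []):
        for translation in dictionary.get(associated, []):
            if translation not in surrogates:
                surrogates.append(translation)
    return surrogates


# Hypothetical toy data: a tokenized monolingual source-language corpus
# (romanized placeholders) and a small bilingual dictionary.
docs = [
    ["x_unknown", "keizai", "seisaku"],
    ["x_unknown", "keizai"],
    ["keizai", "ginkou"],
]
dictionary = {
    "keizai": ["economy"],
    "seisaku": ["policy"],
    "ginkou": ["bank"],
}

vectors = build_context_vectors(docs)
print(surrogate_translations("x_unknown", vectors, dictionary))
```

In this toy setup the untranslatable term `x_unknown` is replaced in the target-language query by the translations of the terms it co-occurs with most often. A query-dependent variant would presumably restrict or reweight the associated terms using the other terms of the query, but the abstract does not give enough detail to sketch that faithfully.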
Similar resources
A Probabilistic Translation Method for Dictionary-based Cross-lingual Information Retrieval in Agglutinative Languages
Translation ambiguity, out-of-vocabulary words, and missing translations in bilingual dictionaries make dictionary-based Cross-language Information Retrieval (CLIR) a challenging task. Moreover, in agglutinative languages that do not have reliable stemmers, the absence of various lexical formations from bilingual dictionaries degrades CLIR performance. This paper aims to introduce a probabilistic tr...
Using Word Embeddings for Query Translation for Hindi to English Cross Language Information Retrieval
Cross-Language Information Retrieval (CLIR) has become an important problem in recent years due to the growth of content in multiple languages on the Web. One of the standard methods is to use query translation from the source to the target language. In this paper, we propose an approach based on word embeddings, a method that captures contextual clues for a particular word in the source l...
English-Chinese Cross-Language Information Retrieval using Lucene Toolkit
In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus on finding effective translation equivalents between English and Chinese and on improving the performance of Chinese IR. For English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize an English-Chinese bilingual dictionary as the important knowledge res...
Query Translation for Cross-lingual Information Retrieval using Wikipedia
This paper introduces WikiTranslate, a system that performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia. Queries are mapped to Wikipedia concepts, and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics in Dutch, French and Spanish i...
A System to Mine Large-Scale Bilingual Dictionaries from Monolingual Web Pages
This paper describes a system that automatically mines English-Chinese translation pairs from a large number of monolingual Chinese web pages. Our approach is motivated by the observation that many Chinese terms (e.g., named entities that are not stored in a conventional dictionary) are accompanied by their English translations in Chinese web pages. In our approach, candidate translations are ...